The purpose of this review of the literature was to examine the appropriateness of using with women the same selection and classification procedures that are used with men in situations in which both men and women are selected for the same jobs. Particular attention was paid to reports of the selection of women for jobs similar to billets in the U. S. Navy.

Two valuable sources of information on women which included comparable data for men were U. S. Air Force reports of the selection of personnel for Air Force technical schools and British reports of the selection of women for the Auxiliary Territorial Service during World War II.

Most industrial studies were based on samples of one sex. It became apparent that in normal times most civilian jobs are held predominantly by members of one sex or the other; hence the practical problem of the influence of sex differences on predictive measures has not been investigated to any extent by those who are conducting personnel research in industry.

The findings tend to support the assumption that tests developed and used for the selection of men must be carefully examined before they are used in selecting women for the same jobs. This seems particularly true for tests in the mechanical and computational areas. In general, the findings seem to indicate that a given test score may not predict the same level of on-the-job performance for a woman as for a man; there is evidence that women do better than would be predicted from scores based on tests and procedures developed for men.
A REVIEW OF THE LITERATURE


I. Introduction

During any period of national mobilization it is likely that women will be called upon to carry out a large number of duties and to fill a large number of jobs ordinarily assigned to men in peacetime.

This greater use of women might be regarded by many persons as an emergency measure forced upon us by necessity; these persons might argue that women constitute a labor group whose proficiency on many jobs is generally inferior to that of men. However, it is difficult to reconcile this viewpoint with the increasing peacetime employment of women in non-clerical positions.

As can be seen by examining Table 1, women have become an increasingly large part of the labor force of the United States over the past seventy years. Whereas in 1880 only 15 per cent of persons gainfully employed were women, by 1953 women made up 30 per cent of this group. Census data indicate that in the last quarter of a century the distribution of women among the various occupational groups has changed steadily. In mechanical and manufacturing industries, for instance, the proportion of the workers who were women increased from 13 per cent in 1930 to 19 per cent in 1953. Furthermore, a greater proportion of mechanical and manufacturing jobs are today filled by women: the percentage of the total number employed on these jobs who were women increased from 18 per cent in 1950 to 25 per cent in 1953.

The United States Navy during World War II found personnel of its female component, the WAVES, to be effective and competent in a considerable number of billets ranging from draftsman to disbursing clerk. In the selection and classification of WAVES personnel, both officer and enlisted, the tests that were and are being employed have largely been tests developed for use with male personnel. That these tests might not be entirely appropriate for use with women has been a matter of concern to the Navy. In particular, it seemed necessary to examine the implications of the fact that women obtain lower scores than men on some aptitude tests, notably on tests in the mechanical or quantitative area.


Table 1

Proportion of Women in the Total Group of Gainfully Employed Persons in the United States
(Over 10 years of age between 1880 and 1930 and over 14 years of age thereafter)*

Year    Per Cent
1880    14.7
1890    17.4
1900    16.8
1910    23.4
1920    21.1
1930    22.0
1940    24.9
1950    28.9
1953    30.1

*Compiled from the World Almanac and Book of Facts. New York: New York World Telegram and Sun, 1935, p. 314; 1945, p. 546; 1951, p. 582; 1954, p. 261. Primary source: Bureau of the Census.


(In order not to leave the impression that women generally make lower scores on tests than men, it should be mentioned that women do as well as, if not better than, men on verbal and clerical tests.)

In the spring of 1953 the Bureau of Naval Personnel, through a contract between the Office of Naval Research and the Educational Testing Service, arranged to have a project carried out that would deal with the appropriateness of existing selection and classification tests for use with enlisted women in the Navy. As one phase of this project there was to be a critical review of the psychological literature on the measurement and evaluation of aptitudes and skills of women, both in military and in industrial settings comparable to those to which enlisted women in the Navy are assigned.


The most relevant evidence for evaluating the appropriateness of selection and classification tests for use with women is provided by the relationship between scores on the test and a suitable assessment of performance on the job. (Since the latter constitutes the pay-off, we shall refer to it as the "criterion.") Inasmuch as most of the existing Navy tests were developed for men and have been used with women in the same way as with men, the criterion-test relationship for women should be compared with that for men in order to arrive at conclusions regarding the appropriateness of the test procedures for use with women.
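As a minimal sketch of the comparison just described, the validity coefficient (the product-moment correlation between test score and criterion) can be computed separately within each sex. The record layout `(group, score, criterion)`, the function names, and the sample records are hypothetical illustrations, not data or procedures from any of the studies reviewed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between test scores xs and criterion values ys."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two cases")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def validity_by_group(records):
    """records: iterable of (group, test_score, criterion); returns {group: r}."""
    by_group = {}
    for group, score, criterion in records:
        xs, ys = by_group.setdefault(group, ([], []))
        xs.append(score)
        ys.append(criterion)
    return {g: pearson_r(xs, ys) for g, (xs, ys) in by_group.items()}


# Hypothetical illustration records, not data from any study
records = [
    ("women", 40, 3), ("women", 50, 5), ("women", 60, 6),
    ("men", 50, 2), ("men", 60, 4), ("men", 70, 7),
]
print(validity_by_group(records))
```

Equal correlations in the two groups do not by themselves show that a score means the same thing for both sexes: two parallel regression lines with different intercepts can yield identical correlations, which is why the regression lines themselves must also be compared.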

As has been mentioned, on some tests enlisted women in the Navy have on the average scored significantly lower than have enlisted men. This finding taken alone is ambiguous in meaning: the significance of a given score may or may not be the same for a woman as for a man.


A few diagrams may serve to illustrate this point. In the first three of these, various possible relationships between the criterion and the test score have been depicted. In each case the mean score observed for women is taken to be lower than that for men. In Figure 1, the regression of criterion performance on test score has been illustrated as being the same for women as for men. This illustration would correspond to (and justify) the practice of treating a given score as having identical meaning for a woman as for a man. According to this representation women score lower than men on the test, and they do not, on the average, perform as well on the job. If actual evidence from studies could be found to fit this model, such evidence would support use of the same tests with both sexes, with the same cutting score being appropriate for both.
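Under the Figure 1 model a single regression line serves both sexes, so one cutting score can be read off it: the lowest score whose predicted criterion reaches the minimum acceptable level. The sketch below fits that line by ordinary least squares to pooled, hypothetical data lying exactly on the line y = 2 + 0.1x; the minimum criterion level of 7.5 and the function names are likewise assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares regression of criterion on test score; returns (a, b) for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cutting_score(a, b, min_criterion):
    """Test score at which predicted criterion equals min_criterion (assumes b > 0)."""
    return (min_criterion - a) / b


# Hypothetical Figure 1 data: women score lower, but both sexes fall on one line
women_x, women_y = [30, 40, 50], [5, 6, 7]
men_x, men_y = [50, 60, 70], [7, 8, 9]

a, b = fit_line(women_x + men_x, women_y + men_y)
cut = cutting_score(a, b, min_criterion=7.5)   # about 55, the same cut for both sexes
```

Because the line is common to both groups, the same cut applies to women and men; with the women's scores centered lower, a smaller fraction of them will typically clear it.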

But next consider Figure 2. Here, women are again shown as generally scoring lower on the test than men, but this time they are represented as performing, on the average, as well on the job as do the men. A given score does not, in this model, have the same implication for a woman that it has for a man. While the test is effective for identifying the better and poorer prospects in each group, for adequate interpretation of a given score the sex of the recruit must be taken into account. A recruit with a score of X points on the test will be most likely to have a criterion performance of W if a female and M if a male. Operationally, if evidence supported this model, different procedures would be required for women than for men. One cutting score for women and another for men would probably be the simplest way of taking the sex of the recruit into account in such a situation.
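For the Figure 2 model the two regression lines are parallel but displaced, so the same score X predicts a higher criterion (W) for a woman than for a man (M), and separate cutting scores follow directly. The intercepts, slope, score, and minimum criterion level below are hypothetical values chosen only to make the arithmetic visible.

```python
SLOPE = 0.1                                   # common slope (hypothetical)
INTERCEPT = {"women": 3.0, "men": 2.0}        # women's line sits higher (hypothetical)


def predicted_criterion(sex, score):
    """Most likely criterion performance for a recruit of the given sex and score."""
    return INTERCEPT[sex] + SLOPE * score


def cutting_score(sex, min_criterion):
    """Lowest score whose predicted criterion reaches min_criterion for this sex."""
    return (min_criterion - INTERCEPT[sex]) / SLOPE


X = 50
W = predicted_criterion("women", X)                    # about 8.0
M = predicted_criterion("men", X)                      # about 7.0
cuts = {sex: cutting_score(sex, 7.5) for sex in INTERCEPT}   # about 45 (women), 55 (men)
```

With equal slopes the two cuts always differ by the intercept gap divided by the slope, here (3.0 - 2.0) / 0.1 = 10 score points, at every criterion level.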

In constructing Figure 2, the relationship between criterion and test was taken to be similar for the two groups, only the means of the test scores being represented as different. In Figure 3, on the other hand, a more complex difference is illustrated. In this figure the regression of criterion performance on test score is pictured as different for women than for men: both the slopes of the regression lines and the errors of estimate are taken as different. Evidence supporting this model would again point to the desirability of different procedures for women than for men. However, this situation is more complicated than that represented in Figure 2. Women do score lower than men on the test but on the average do as well on the job, as was also the case in Figure 2. In the representation in Figure 3, however, a given increase in test score is associated with a greater jump in criterion performance for women than for men. (For a score of X points, the most likely criterion performance is W for women and M for men. For a score of T, these performances are W' and M', respectively. Note that the increase from W to W' is greater than the increase from M to M'.)
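The Figure 3 comparison can be made concrete with two hypothetical regression lines that differ in slope. The intercepts, slopes, and the scores X = 50 and T = 60 are assumptions for illustration only; the point is that W' - W exceeds M' - M, so the predicted gap between the sexes changes along the score scale.

```python
LINES = {"women": (-1.0, 0.2), "men": (3.0, 0.1)}   # (intercept, slope), hypothetical


def predict(sex, score):
    """Most likely criterion performance on the sex-specific regression line."""
    a, b = LINES[sex]
    return a + b * score


X, T = 50, 60
W, W_prime = predict("women", X), predict("women", T)   # about 9.0 and 11.0
M, M_prime = predict("men", X), predict("men", T)       # about 8.0 and 9.0

jump_women = W_prime - W                 # about 2.0
jump_men = M_prime - M                   # about 1.0
gap_at_X, gap_at_T = W - M, W_prime - M_prime   # about 1.0 and 2.0
```

With this single predictor, separate cuts still work, since each cut is fixed at one criterion level; the changing gap becomes a real complication once the test is combined with other scores.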

If only this single predictor is involved in the acceptance or rejection of a recruit for the particular assignment, again a different cutting score for women than for men would be a simple means of handling the situation. However, if scores on the test are to be combined with other scores, then the model represented in Figure 3 will require more complex treatment than will the model in Figure 2. In particular, if linear composites are to be obtained through use of multiple regression equations, separate multipliers or regression weights will be needed for men and women for a given test.
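When several tests are combined, standardized regression (beta) weights solve R_xx β = r_xy, where R_xx holds the intercorrelations of the tests and r_xy their validities; the squared multiple correlation is then β · r_xy. The NumPy sketch below uses hypothetical correlation values for each sex to show how different correlation patterns lead to different weights for the same test; the function name is illustrative.

```python
import numpy as np


def beta_weights(r_xx, r_xy):
    """Standardized regression weights and multiple R from correlation inputs."""
    r_xx = np.asarray(r_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    beta = np.linalg.solve(r_xx, r_xy)        # solves R_xx @ beta = r_xy
    multiple_r = float(np.sqrt(beta @ r_xy))  # R^2 = beta . r_xy
    return beta, multiple_r


# Hypothetical two-test battery; compare the weight given to test 1 in each group
women_beta, women_R = beta_weights([[1.0, 0.0], [0.0, 1.0]], [0.30, 0.40])
men_beta, men_R = beta_weights([[1.0, 0.5], [0.5, 1.0]], [0.30, 0.30])
# women: beta about [0.30, 0.40], R about 0.50
# men:   beta about [0.20, 0.20], R about 0.35
```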


The fourth figure illustrates still another possibility for which evidence might come to light in a survey of the literature. Perhaps there exist other predictive measures for each of which the scatterplot of criterion performance on test score will be the same for women as for men. If such measures could be found, they would have distinct advantages over present measures in terms of ease of interpretation and application.
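One conventional way to check whether evidence fits the Figure 4 model, that is, whether a single regression line will do for both sexes, is to compare the residual sum of squares of a pooled line with that of separate lines (an F test often called a Chow test). The sketch below assumes a single predictor and equal error variance in the two groups, an assumption the Figure 3 model explicitly violates; the function names and data are hypothetical.

```python
import numpy as np


def residual_ss(x, y):
    """Residual sum of squares from an ordinary least-squares line y = a + b*x."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def equal_lines_f(x1, y1, x2, y2):
    """F statistic for H0: both groups share one intercept and one slope.

    Returns (F, (df1, df2)); compare F with the F(df1, df2) distribution.
    """
    x1, y1, x2, y2 = (np.asarray(v, dtype=float) for v in (x1, y1, x2, y2))
    rss_separate = residual_ss(x1, y1) + residual_ss(x2, y2)
    rss_pooled = residual_ss(np.concatenate([x1, x2]), np.concatenate([y1, y2]))
    k = 2                                   # parameters per line: intercept and slope
    df2 = len(x1) + len(x2) - 2 * k
    f = ((rss_pooled - rss_separate) / k) / (rss_separate / df2)
    return f, (k, df2)


# Hypothetical data: identical lines versus lines shifted by 2 criterion units
x = [1, 2, 3, 4, 5, 6]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
same = [2 + 0.5 * xi + e for xi, e in zip(x, noise)]
shifted = [4 + 0.5 * xi + e for xi, e in zip(x, noise)]

f_same, df = equal_lines_f(x, same, x, same)        # near 0: one line fits both
f_shift, _ = equal_lines_f(x, same, x, shifted)     # large: separate lines needed
```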

Having discussed several possible models which available evidence might fit, it is now appropriate to turn to the review proper to see what data have in fact been published, and how these data might influence the course of further development of predictive measures to be used with both men and women.


At first it was hoped that this report might be devoted largely to the review of published articles reporting studies in which both men and women

(a) had been tested with the same tests (the tests being similar to Navy tests),

(b) had worked side by side on the same jobs (the jobs being similar to Navy billets), and

(c) had been evaluated with the same measures and techniques for assessing performance.

In reading the next section of the review it will be observed that only a few reports could be found which fit this description even loosely.


Although many factory jobs are performed by both men and women, there seemingly has been little interest in a comparative analysis of the relation between on-the-job performance and selection test scores for men and women. This is true, at least, of published research. The experimental populations involved in most reported studies consist entirely of members of one sex. In many cases the selection measures utilized are ones that have been standardized on populations of one sex. Very often the same cutting scores are used quite uncritically for selecting workers of both sexes.
Rather than confine the attention of the report to the few studies which fit the specifications listed above, the scope of the report was expanded to include studies describing situations in which women had been selected for employment on jobs similar to billets to which women might be assigned in the Navy, both (1) with tests similar to those in the Navy basic test battery and (2) with other tests, even when comparable data for men were not presented. In considering the appropriateness of present Navy tests for use with women, one quite properly can raise the questions of whether tests like the ones being used do predict success on the job, and of whether there are other tests which might be preferable from a predictive viewpoint. Furthermore, a presentation of what has been published along these lines may point out the gaps which other research workers may then fill.

Another extension of the coverage was the inclusion of a review of some reports of sex differences on tests similar to tests in the Navy basic battery. Reported sex differences, of course, must be regarded as evidence of a need for further information. The pertinence of such reports here is that they call attention to the need for additional data on these Navy-like tests so that the appropriate way to employ and interpret scores on such tests may become clear.

Contents of Later Sections of the Report. In Section II brief descriptions of Navy tests used with enlisted women are presented. Sections III and IV present reviews of two major contributions to the literature on the appropriateness of selection and classification measures used with both men and women.

Sections V and VI contain reviews of studies in which women have been selected for jobs similar to those in the Navy and the effectiveness of the tests for predicting success on the job has been ascertained. The studies covered herein differ from those reviewed in Sections III and IV in that no comparable data are presented for men.

Section VII contains a review of reported studies in which sex differences have been observed on tests similar to those in the Navy basic test battery. Studies included in this section did not present follow-up validity data on the tests.




The final section, Section VIII, attempts to assess the significance of the literature surveyed from the standpoint of its implications for the appropriateness of selection and classification tests for use with enlisted women in the Navy.

With the exception of the British work and of the U. S. Air Force studies reported in the following two sections, most of the studies reviewed have been modest in scope, carried out by a single research worker or small group of workers over a short period of time. The aims have been practical, resulting in attempts to solve pressing problems of the moment. Consequently, the studies vary greatly in scope, method, choice of criteria, and subjects. Comparisons between them tend to be difficult as well as dangerous. Research designs vary in quality, and although techniques have improved in recent years, authors frequently neglected to describe fully the methods that were used.
II. Navy Tests Administered to Enlisted Women

Since a major focus of attention in this report is the tests used by the Navy for the selection and classification of enlisted women, it is quite appropriate to present brief descriptions of these tests. As findings from various military and industrial situations are presented, the reader may then be better able to note to what extent the results cited are for tests similar to Navy tests or for tests somewhat different from these.

There are four tests in Form 5 of the Navy Basic Test Battery. These are (1) the General Classification Test, (2) the Arithmetic Test, (3) the Mechanical Test, and (4) the Clerical Aptitude Test. Each test is briefly described below. (The items provided as illustrations are similar to those in the tests, but of course are not actual operational items.)

1. General Classification Test. This is a test composed of verbal analogies and sentence completion items. Items resemble the following:

PAGE is to book as TREE is to

(1) lumber (2) forest (3) paper (4) farm

If the radar indicates the approach of enemy aircraft, a general ______ will be sounded.

(1) barrage (2) control (3) observation (4) battle (5) warning

2. Arithmetic Test. Items in Part I of this test call for the addition, subtraction, multiplication, and division of integers and fractions. Some items also involve the use of decimals and percentages.

The second part of the test is a measure of arithmetic reasoning. Items are of the following type:

If a man walks a mile in 20 minutes, how many miles will he walk in two hours at this pace?

(A) 2 (B) 3 (C) 6 (D) 8 (E) 10
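The sample reasoning item reduces to a rate calculation: at one mile per 20 minutes, two hours (120 minutes) gives 120 / 20 = 6 miles, option (C). A few lines of Python make the keying explicit; the option table simply restates the item above, and the function name is illustrative.

```python
from fractions import Fraction

OPTIONS = {"A": 2, "B": 3, "C": 6, "D": 8, "E": 10}


def miles_walked(minutes_per_mile, total_minutes):
    """Distance covered at a steady pace, kept exact with Fraction."""
    return Fraction(total_minutes, minutes_per_mile)


distance = miles_walked(20, 2 * 60)                          # 6 miles
key = next(k for k, v in OPTIONS.items() if v == distance)   # "C"
```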

3. Mechanical Test. This test consists of two parts. The first part is a test of mechanical and electrical knowledge; the second, a test of mechanical comprehension. Both parts are in pictorial form.

In Part I each item involves four pictured objects. These may be tools, materials used in construction, electrical devices and components, and the like. The examinee is called upon to state which of the other three pictured objects is most closely associated with the first, or stimulus, object. Items resemble the following:


The mechanical comprehension part of the test is similar to Bennett's Test of Mechanical Comprehension (19). An illustration of the item type is given below:

[Figure: a driver gear with two other gears, labeled A and B]

Which of the gears turn in a direction opposite to that of the driver?

(A) Gear A
(B) Gear B
(C) Both gears
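The principle behind the gear item is that each pair of directly meshed external gears turns in opposite directions, so a gear's direction depends on whether it is an odd or even number of meshes away from the driver. Because the original drawing is not reproduced, the layout used below (driver meshing with A, and A with B) is a hypothetical example rather than the actual item, and the function name is likewise illustrative.

```python
from collections import deque

FLIP = {"clockwise": "counterclockwise", "counterclockwise": "clockwise"}


def gear_directions(meshes, driver="driver", driver_dir="clockwise"):
    """Propagate rotation from the driver; each direct mesh reverses direction."""
    neighbors = {}
    for a, b in meshes:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    direction = {driver: driver_dir}
    queue = deque([driver])
    while queue:
        gear = queue.popleft()
        for other in neighbors.get(gear, ()):
            wanted = FLIP[direction[gear]]
            if other not in direction:
                direction[other] = wanted
                queue.append(other)
            elif direction[other] != wanted:
                raise ValueError("gear train is locked (odd loop of meshed gears)")
    return direction


# Hypothetical layout: driver turns A, and A turns B
directions = gear_directions([("driver", "A"), ("A", "B")])
# Under this layout only gear A turns opposite to the driver, i.e. option (A)
```

A closed loop with an odd number of meshed gears cannot turn at all, which the function reports as an error rather than returning a direction.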

4. Clerical Aptitude Test. Form 5A of this test consists of two parts, name checking and number checking. The content of this Navy test is similar to that of the Minnesota Clerical Test (5). The National Institute of Industrial Psychology (NIIP) Group Test 20 (Checking) (131) is similar to the second part of the test. Another test used in industrial situations, the Hay Number Perception Test, also resembles part of the Navy test.

In the name-checking part, the examinee is called upon to compare names presented in two columns and, as rapidly as he can, to indicate whether these names are the same or different. An illustration is

                              S     D
Pioneer Forester Company     ___   ___   Pioneer Forestry Company

                              S     D
Benson's Refinery Works      ___   ___   Benson's Bebindery Works

The number-checking part is arranged in a similar manner. An illustration of the item type is the following:

           S     D
473926    ___   ___   473926

           S     D
3200528   ___   ___   320528
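Keying a checking item is a simple exact comparison of the two entries. The sketch below keys some of the sample items and counts correct responses; rights-only scoring is an assumption made for illustration, since the scoring formula for the Navy test is not described here, and the function names are hypothetical.

```python
def key_for(left, right):
    """'S' if the two entries are identical, otherwise 'D'."""
    return "S" if left == right else "D"


def number_right(items, responses):
    """Count responses that match the key (hypothetical rights-only scoring)."""
    return sum(key_for(left, right) == resp for (left, right), resp in zip(items, responses))


ITEMS = [
    ("Pioneer Forester Company", "Pioneer Forestry Company"),
    ("473926", "473926"),
    ("3200528", "320528"),
]
keys = [key_for(left, right) for left, right in ITEMS]   # ['D', 'S', 'D']
score = number_right(ITEMS, ["D", "S", "S"])            # 2
```

The comparison rule itself is trivial; what the test measures is how quickly and accurately an examinee applies it under the time limit.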


III. Selection and Classification of Women in the British Armed Forces


Studies reporting the occupational selection of women with comparable data for men have arisen largely out of situations in which the problems are similar to those faced by the Navy. Reports of the selection of women in the British forces and in the U. S. Air Force were found to provide the most valuable sources of information. This section will present a review of British reports, and the following section will be concerned with the U. S. Air Force studies.

Reports on the selection procedures used by the British during World War II indicate that before the end of the war over 200,000 women had been selected for a total of 106 jobs in the Auxiliary Territorial Service (ATS). Routine written reports of success in training apparently were available for 39,000 of these women who entered the ATS between October 1942 and September 1943. Special reports were prepared on all auxiliaries failing in the training courses. In addition, some measure of on-the-job proficiency was obtained for a total of 5000 auxiliaries who were followed up in 27 particular trades and employments by means of visiting, collecting examination results, and holding rating conferences on proficiency.

In view of the large number of women selected, the variety of jobs performed, and the opportunity for follow-ups beyond the training period to actual performance on the job, it was at first expected that this material would provide answers to many of the questions that led to the preparation of this review. This did not prove to be so, for, as Vernon (183) has pointed out, the exigencies of war created an extreme shortage of trained staff, of Personnel Selection Officers familiar with statistical methods, and of automatic calculating machines, and, further, prevented any planned experimental approach. As a result, comparisons of the British data with data produced in this country tend to be difficult.

In the British reports published since the war, criteria have seldom been explicitly stated. This may be due in part to the fact that they seem to have varied from situation to situation. During wartime, characteristics of score distributions were reported in terms of medians and percentiles because these statistics were considered to be more easily interpreted by the inexperienced personnel who used them, even though, as Vernon (183) noted, these distributions were frequently skewed. Problems are further complicated for the American reader because job descriptions were not supplied, and there is reason to believe that similar titles did not always imply similar responsibilities.

In some cases the tests in use with women differed from those in use with men, even at the outset. In other cases, identical tests were weighted differently for women than for men. Validity coefficients were reported for job groupings rather than for single jobs, and frequently these groupings do not correspond to job groupings in U. S. women's forces for which corresponding data are available.

Selection for jobs in the ATS (as in other branches of the British services) was not based entirely on test scores. Interviews and a "Qualification Form" (biographical inventory) supplemented test information. Vernon (184) stated that no evidence was shown that objective testing could entirely replace the interview. Vernon and Parry (186, p. 287) indicated a belief that the interview was "essential on the grounds of flexibility and humanity, in spite of its inaccuracy."

The test battery used for selection and classification varied from one branch of the service to another. While no attempt will be made to describe all the tests in detail, a brief description of measures referred to in this report is given below. For a fuller account the reader should consult Vernon and Parry (186).

The Progressive Matrices Test was adopted as the primary general intelligence test in the Royal Navy, Army, and ATS in 1941. It is a non-verbal intelligence test. Each item requires the subject to induce relationships among geometric figures in a matrix in order to select a figure which completes the pattern (186, p. 2jk).

The  Instructions  Test  was  a clerical  test  which  required  the 
examinee  to  perform  the  operations  of  checking,  filing,  classifying,  and 
coding  printed  information  in  rapid  rotation  (186,  p.  222). 
The Spelling Test used in the ATS required the examinee to choose the one correct spelling of a word from a list of six different spellings. A synonym at the beginning of each line helped the examinee to identify the word (186, p. 227).

The Arithmetic and Mathematics Tests typically contained two parts: the first consisting of straightforward addition, subtraction, multiplication, and division problems, and the second including thirty to forty brief problems (186, p. 228). These tests were considerably simplified for use in the ATS.

The Spatial Test used most widely in the British Navy, Army, and ATS was the NIIP Squares Test. This consisted of a series of fifty figures, in each of which the examinee was required to draw a dividing line such that the two pieces so formed would, if turned around, make a square (186, p. 236).

Bennett's Test of Mechanical Comprehension was one of several mechanical tests used in the Navy, Army, and ATS. In the ATS a Practical Problems Test roughly comparable to Form W1 of the Bennett test was also introduced (186, p. 241). (Form W1 is a form designed for women.)

A Meccano Assembly Test was used with great success in the ATS as a supplement to the Bennett (186, p. 242). (Meccano is a construction set consisting of punched strips, wheels, gears, pulleys, and plates, out of which it is possible to assemble operating models of many mechanical devices.) In the Army a similar but more difficult assembly test was used.

A considerable degree of success was reported for the selection procedures used in the ATS (186, p. 48). Wickham (191) provided correlations between scores on six tests of the ATS battery and the criterion of success in training for 27 ATS jobs. Beta weights and multiple correlations are also cited. A summary of these data is given in Table 2. On-the-job validity data are presented in Table 3 for twelve job groupings in the ATS. The number of cases on which correlations were based varied from 30 in some samples to 1128 in the largest sample; the median sample size was 106 (186, p. 210; 191, Table 4). In another report
Mercer (128) observed that multiple correlations (corrected, presumably, for restriction of range) for the test battery with success in 27 particular trades varied from .41 to .96, with a clustering around .70.
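The text does not say which correction Mercer applied. As an illustration only, the widely used univariate formula for direct selection on the predictor (Thorndike's Case 2) is sketched below, where k is the ratio of the unrestricted to the restricted standard deviation of test scores. Vernon and Parry's correction for multivariate selectivity (Table 5) is more elaborate, and the input values here are hypothetical.

```python
from math import sqrt


def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
    """Thorndike Case 2: r_c = r*k / sqrt(1 - r**2 + r**2 * k**2), with k = SD ratio."""
    if sd_restricted <= 0:
        raise ValueError("restricted SD must be positive")
    k = sd_unrestricted / sd_restricted
    return r * k / sqrt(1 - r ** 2 + (r * k) ** 2)


# Hypothetical: r = .40 observed in a selected group whose score SD is 2/3 of the applicants'
corrected = correct_for_range_restriction(0.40, sd_unrestricted=15.0, sd_restricted=10.0)
# corrected is about .55
```

When no selection has taken place (k = 1) the formula returns the observed coefficient unchanged, and it grows as the selected group becomes more homogeneous relative to the applicant pool.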

Some information which might be used as a gauge of the relative effectiveness of test batteries used in the ATS and other British services was given by Vernon and Parry (186, p. 212). The results for the principal tests are summarized in Tables 4 and 5, which present not only the median validity coefficients in all comparable Naval and ATS studies but also a notion of the range of the observed coefficients.
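Tables 4 and 5 summarize each distribution of validity coefficients by its 10th, 50th, and 90th percentiles. A sketch of that summary with Python's `statistics` module follows; the eleven coefficients are invented for illustration, and the percentile convention (the module's default "exclusive" method) may differ from the one Vernon and Parry used.

```python
import statistics


def percentile_summary(coefficients):
    """10th, 50th and 90th percentiles of a list of validity coefficients."""
    deciles = statistics.quantiles(coefficients, n=10)   # nine cut points
    return {"10th": deciles[0], "50th": deciles[4], "90th": deciles[8]}


# Hypothetical validity coefficients from eleven job groups
coefficients = [0.05, 0.12, 0.18, 0.22, 0.25, 0.28, 0.31, 0.35, 0.40, 0.47, 0.58]
summary = percentile_summary(coefficients)   # the 50th percentile equals the median, .28
```

With only a handful of coefficients per test, the 10th and 90th percentiles depend heavily on the interpolation rule, so figures computed under different conventions should be compared with caution.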
Mercer (128) indicated that, of the 39,000 auxiliaries passing through selection procedures and undergoing training for specific ATS trades and employment, over 94 per cent were successful in training. Mercer also provided figures, summarized in Table 6, showing the decrease in failure rates following the introduction of a selection program.
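The practical size of the reductions in Table 6 is easier to see as a relative drop in failure rate. The sketch below uses the rates as printed in the table; it compares rates only, the pre- and post-introduction groups differ in size, and the wartime conditions described above precluded planned experiments, so the figures are not a controlled comparison. The function name is illustrative.

```python
# Failure rates (per cent) before and after selection, as printed in Table 6
TABLE_6 = {
    "Clerks": (11.0, 4.0),
    "Drivers": (30.0, 14.0),
    "Special Operators": (61.0, 8.0),
    "Operators: Wireless and Line": (7.0, 0.9),
}


def relative_reduction(before, after):
    """Fraction of the original failure rate removed after selection was introduced."""
    return (before - after) / before


for job, (pre, post) in TABLE_6.items():
    print(f"{job}: {pre}% -> {post}%, failures cut by {relative_reduction(pre, post):.0%}")
```

The relative reductions come to roughly 64, 53, 87, and 87 per cent, respectively.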

Vernon and Parry (186, p. 120) reported training failure rates for 10,000 Army tradesmen selected by four different methods in 1942 and trained simultaneously. These rates varied from 27.1 per cent for men called up by the Ministry of Labour as semi-qualified tradesmen, to 17 per cent for men nominated by the commanding officer or other technical officer, to 16.6 per cent for men nominated at their own request, and to 8.7 per cent for men selected by Personnel Selection Officers. Similarly, Vernon and Parry (186, p. 121) indicated that failure rates for men trained as Fleet Air Arm mechanics and fitters fell from an average of 14.7 per cent to 4.7 per cent with the introduction of the new selection methods.

From the findings cited above, it seems reasonable to assert that the effectiveness of the selection procedures in use in the ATS was comparable to that of the procedures in use in the Army and the Royal Navy, insofar as it is safe to assume that criteria of success and failure remained stable with the passage of time and from service to service.

The  rather  high  validities  reported  for  the  tests  In  the  ATS 
battery  may  be  accounted  for  in  part  by  the  fact  that  no  attempt  was 
made  to  use  with  women  the  same  tests  and  weights  used  for  selecting 
men  for  similar  Jobs.  Tests  were  stamdardlzed,  and  weights  were  established 
Table 4

Descriptive Statistics of the Distributions of Raw Validity Coefficients
and Multiple R for Standard Naval Selection Tests

Adapted from Vernon and Parry (186, p. 21?)

Statistic         Progressive  Shipley      Bennett      Mathe-   Squares  T2*    Multiple R
                  Matrices     Abstraction  Mech. Comp.  matics                   (uncorrected)

90th percentile   .45          .49          .44          .57      .38      .57    .70
50th percentile   .28          .30          .28          .35      .22      .40    .47
10th percentile   .10          .11          .13          .17      .05      .20    .32

*T2 refers to Total Score (with Test 1 doubled) on the Royal Navy's standard test battery. The battery consisted of four group tests: 1. Modified Shipley Abstraction, 2. Modified Bennett Mechanical Comprehension, 3a. Arithmetic, 3b. Mathematics, and 4. Squares Test of Spatial Judgment.


Table 5

Descriptive Statistics of the Distributions of Validity Coefficients
Corrected for Multivariate Selectivity and of Multiple Correlation
Coefficients for Basic Selection Tests in the ATS

Adapted from Vernon and Parry (186, p. 212)

Tests: Progressive Matrices, Bennett Mech. Comp., Arithmetic, Squares, Clerical
Multiple R: uncorrected, corrected

Statistic
90th percentile   .65  .Jn  .69  .51  .66  .69  .6k
50th percentile   .30  .51  .40  .56  .47  .65
10th percentile   .27  .19  .26  .20  .35  .35  .50

Table 6

Failure Rates in Four ATS Job Groups Prior to and Following
Introduction of Selection Procedures

Adapted from Mercer (128, p. 196)

                               Pre-Introduction      Post-Introduction
                               Number    Failure     Number    Failure
Job                            Selected  Rate        Selected  Rate
Clerks                         128       11%         592       4%
Drivers                        124       30%         1004      14%
Special Operators              420       61%         130       8%
Operators: Wireless and Line   217       7%          187       0.9%


for use with women, through studies of ATS personnel. Although this procedure may indicate excellent judgment on the part of those responsible, it complicates the task of analysis undertaken here. A simpler arithmetic test was devised for the ATS than was used with men, since women seemed to have difficulty in handling decimals. In selecting mechanics, a modified Bennett test was supplemented by a Meccano Assembly Test.

Vernon (185) reported that this combination of the Bennett and the assembly test showed greater differentiation in selecting women for mechanical jobs than did the Bennett, used alone, in selecting men for similar jobs. He concluded elsewhere (184) that tests of mechanical comprehension were acceptable for use with women and adolescents who had little previous experience, but that for adult males, straightforward information tests were more promising.

Vernon further stated that mathematics tests seemed to give higher correlations with proficiency at the end of training for mechanical jobs than did any of the mechanical tests. In the absence of any information about the methods of selecting the samples upon which these conclusions were based, these assertions should perhaps be qualified. As Vernon himself pointed out in other discussions, validity figures for any particular selection variable suffer to the extent that personnel have been selected on that variable. As a result, some other test, playing no role in the original selection, may achieve an apparently closer relationship to the criterion than is observed for the variable on which selection has occurred. Vernon and Parry (186, p. 213) reported that although for some jobs all of the tests used yielded high validity coefficients, and for other jobs all coefficients were low, "the relative validities of the different tests were remarkably uniform."
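The effect Vernon describes, in which selection on a test depresses that test's own validity more than the validity of a test not used in selection, can be illustrated with a small simulation. The Python sketch below is hypothetical: it assumes a normally distributed applicant population in which test A, test B, and the criterion all intercorrelate .50, selects the top 30 per cent on A alone, and then computes both validities within the selected group. Under these assumptions standard selection theory predicts roughly .28 for A and .39 for B, although both tests were equally valid before selection.

```python
import math
import random


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, rho=0.5, keep=0.30, seed=1):
    """Select the top `keep` fraction on test A; return validities of A and B.

    Hypothetical population: corr(A, Y) = corr(B, Y) = corr(A, B) = rho.
    """
    rng = random.Random(seed)
    b_load = math.sqrt(1 - rho ** 2)
    y_on_z2 = (rho - rho * rho) / b_load          # gives corr(B, Y) = rho
    y_on_z3 = math.sqrt(1 - rho ** 2 - y_on_z2 ** 2)  # gives var(Y) = 1
    people = []
    for _ in range(n):
        z1, z2, z3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        a = z1
        b = rho * z1 + b_load * z2
        y = rho * z1 + y_on_z2 * z2 + y_on_z3 * z3
        people.append((a, b, y))
    # Direct selection on A only: keep the highest scorers on A.
    people.sort(key=lambda p: p[0], reverse=True)
    a_s, b_s, y_s = zip(*people[: int(keep * n)])
    return pearson(a_s, y_s), pearson(b_s, y_s)


r_a, r_b = simulate()
print(f"validity of A (used in selection): {r_a:.2f}")
print(f"validity of B (not used):          {r_b:.2f}")
```

In a selected group of this kind, B can therefore look like the "better" test even though the two were interchangeable in the applicant population, which is the caution the paragraph raises about Vernon's comparison of mathematics and mechanical tests.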

According to Vernon and Parry, the tests were mainly used to distribute the available supplies of high-quality personnel among the different branches according to their needs. Differentiation between jobs was based more on interests and interview judgments than on test scores (186, p. 215). Vernon and Parry speculated upon the reasons for the tendency of the "verbal: educational" tests to correlate in general more highly with measures of proficiency than any of the other tests administered. They offered several possible explanations for these findings (186, pp. 215-216): the high reliability of verbal and clerical tests; the high g-saturations of these tests; the possibility that these tests also measured certain personality or temperamental qualities important to vocational success; the extreme heterogeneity of the samples with respect to g; and the tendency of the jobs to be more varied than most civilian employments. These authors also suggested that if more objective measures had been available as training criteria, and if training criteria had been supplemented with assessments of operational efficiency, the "verbal: educational" tests might have yielded lower correlations, and specialized tests higher correlations, with the criteria. Finally, the authors considered that had more time been available, tests more successful than these might have been found.

In order to check on the possibility that the choice of training grades as criteria led to the high validities of the "verbal: educational" tests, Vernon and Parry presented some findings based on operational follow-ups. In one study, assessments of efficiency during fighting in Italy were collected for 200 Royal Marine signallers (186, p. 216). Naval selection tests were administered to the men after they returned to Britain. The T2 scores on the standard Royal Navy battery (see footnote, Table 4) correlated .62, and component tests between .46 and .49, with measures of efficiency. It can be seen by referring to Table 4 that the validities of T2 were as high or higher when efficiency during fighting served as the criterion as when training grades were used. In another study (186, p. 217), over six hundred trainees for anti-aircraft duties in the ATS were followed up, and later some thirteen hundred women were assessed for efficiency after serving two or more years. The average validity coefficients corrected for selectivity are shown for both groups in Table 7.
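Coefficients "corrected for selectivity" estimate what a validity would have been in the unselected applicant group. Table 5 states that Vernon and Parry corrected for multivariate selectivity; the Python sketch below shows only the simpler univariate case (often called Thorndike's Case II), assuming selection was made directly on the predictor and that the regression is linear with equal scatter throughout. It illustrates the idea and is not a reconstruction of their procedure; the example values are hypothetical.

```python
import math


def correct_for_direct_restriction(r: float, k: float) -> float:
    """Univariate correction for direct range restriction (Thorndike Case II).

    r: correlation observed in the selected (restricted) group.
    k: ratio of the predictor's unrestricted SD to its restricted SD (k >= 1).
    Assumes linear, homoscedastic regression and selection on the predictor only.
    """
    return r * k / math.sqrt(1 - r * r + r * r * k * k)


# Hypothetical case: r = .30 among selected trainees whose test SD is
# two-thirds of the applicant SD, so k = 1.5.
print(round(correct_for_direct_restriction(0.30, 1.5), 2))  # → 0.43
```

When k is 1 (no selection on the test) the formula returns the observed coefficient unchanged; the larger the ratio of standard deviations, the larger the correction.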

Brown and Ghiselli (36) compared, for a number of tests given to several job groups, the validities against training criteria with on-the-job validities. They obtained 127 pairs of such validities from research reported in the literature. These authors differed from Vernon and Parry (186) in claiming that the relationship between validities obtained in training situations and on the job tends to be low; Brown and Ghiselli did state, however, that the relationship between the two kinds of validity figure was higher for intelligence and clerical tests than for tests of other kinds. They also indicated that the relationship between the two types of validity was highest for studies involving "manipulative and observational" jobs.


Table  7 

Mean  Validities  (corrected  for  selectivity)  of  A.T.S.  Selection 
Tests  in  Several  Anti-aircraft  Jobs  at  Different  Stages 

Adapted from Vernon and Parry (186, p. 217)


Tests: Progressive Matrices, Bennett Mech. Comp., Arithmetic, Squares, Clerical, Spelling

Training stage (N = 600+)        .47  .24  .53  .11  .42  .06  .65
Operational stage (N = 1300+)    .35  .21  .30  .25  .37  .31  .43
IV. Performance of Women in Air Force Technical Schools


An opportunity for making comparisons between the performance of men and women on the same tests and in the same training situations has been provided by some Air Force research. A study by Howard and Pickrel (97) compared samples of women in the Air Force (WAF) with samples of male airmen in seven Air Force technical schools. Aptitude indices from the Airman Classification Battery and final course grades in these schools were studied for all samples (see Table 8). The seven courses provided training for clerk-typists, radar operators, radio mechanics (general), radio operators, supply technicians, teletype operators, and weather observers.

Howard and Pickrel were primarily interested in investigating the relative validities of the battery for men and women; that is, they wished to determine whether the battery would predict the success of the women sufficiently well to warrant its use in the selection of women for Air Force technical schools. The authors discovered that the validities of the aptitude indices were generally lower for WAF samples than for samples of male airmen (see Table 8), but concluded that the differences were slight and that the aptitude indices from the Airman Classification Battery predicted grades sufficiently well to justify their continued use in selecting both WAF and male airmen for technical schools.

It is interesting to note, however, that in each of the seven schools the mean final grade for the women exceeded that for the men. This holds even for radio mechanics, radio operators, and weather observers, although in these cases the mean score of the women on the selection variable was lower (see Table 8). These facts carry an important implication: the exact nature of the relationship between aptitude scores and course grades is different for women than for men; that is, the final course grade predicted from any given score on the Airman Classification Battery will differ for WAF from that predicted for male airmen.
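The point about differing predictions can be made concrete by fitting a separate least-squares line of final grade on aptitude index for each sex and comparing the grades the two lines predict at the same index. The data in this Python sketch are invented for illustration; they are not Howard and Pickrel's figures.

```python
def fit_line(xs, ys):
    """Ordinary least-squares (intercept, slope) of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def predicted_grade(line, index):
    """Grade predicted by an (intercept, slope) line at a given aptitude index."""
    intercept, slope = line
    return intercept + slope * index


# Hypothetical (aptitude index, final grade) data for one school.
indexes = [3, 4, 5, 6, 7, 8, 9]
men = fit_line(indexes, [2.0, 2.3, 2.5, 2.7, 3.0, 3.2, 3.5])
women = fit_line(indexes, [2.4, 2.7, 3.0, 3.3, 3.6, 3.9, 4.2])

for index in (5, 9):
    diff = predicted_grade(women, index) - predicted_grade(men, index)
    print(f"aptitude index {index}: women minus men = {diff:+.2f} grade points")
```

With these invented data the women's line predicts a grade .50 higher at index 5 and about .73 higher at index 9, so a single cutting score would not correspond to the same expected grade for both groups.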
Mary Agnes Gordon (76) carried the analysis one step further in an attempt to determine the feasibility of using the same minimum qualifying scores for women as for men in selecting students for Air Force technical schools. By studying the regression of final school grades on scores on the selection measures, Gordon provided the basis for a somewhat more sensitive analysis of the relation between the selection variables and the criterion than had hitherto been attempted.

Gordon reported that the regressions of final grades on aptitude scores for the WAF group differed sufficiently in every technical school sample from the regression for the white male group to justify the conclusion that "the same aptitude index will not predict the same final school grade for these groups" (78, p. 6). The regressions for WAF differed from those for white males at the .01 level in all but one sample, that for the Radar Operator School. Significant differences in errors of estimate occurred in the second Clerk-Typist and in the Photo Laboratory Technician samples. A significant difference in slopes was found in the Supply Technician sample. Significant differences in intercepts occurred in the remaining samples. All significant differences in intercepts favored the WAF (see Table 9).
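The sequence of comparisons reported here (errors of estimate, then slopes, then intercepts) corresponds to a standard way of testing whether two regression lines differ. The minimal Python sketch below assumes a simple linear regression of grade on a single aptitude index in each group and computes the three F ratios from sums of squares; converting them to significance levels requires an F table or a statistics library, which is omitted. It is a generic illustration with hypothetical data, not necessarily the exact computation Gordon used.

```python
def _sums(xs, ys):
    """n and the sums of squares/products about the means: Sxx, Sxy, Syy."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return n, sxx, sxy, syy


def compare_regressions(g1, g2):
    """F ratios comparing the regressions of y on x in two groups.

    g1, g2: (xs, ys) pairs. Returns F for errors of estimate (df n1-2, n2-2),
    slopes (df 1, N-4), and intercepts given a common slope (df 1, N-3).
    """
    n1, sxx1, sxy1, syy1 = _sums(*g1)
    n2, sxx2, sxy2, syy2 = _sums(*g2)
    # Residual sums of squares when each group has its own line.
    sse1 = syy1 - sxy1 ** 2 / sxx1
    sse2 = syy2 - sxy2 ** 2 / sxx2
    sse_separate = sse1 + sse2
    # Parallel lines: one pooled within-group slope, separate intercepts.
    sse_parallel = (syy1 + syy2) - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)
    # A single line fitted to both groups combined.
    n, sxx, sxy, syy = _sums(list(g1[0]) + list(g2[0]), list(g1[1]) + list(g2[1]))
    sse_common = syy - sxy ** 2 / sxx
    return {
        "errors_of_estimate": (sse1 / (n1 - 2)) / (sse2 / (n2 - 2)),
        "slopes": (sse_parallel - sse_separate) / (sse_separate / (n - 4)),
        "intercepts": (sse_common - sse_parallel) / (sse_parallel / (n - 3)),
    }


# Hypothetical data: same slope and scatter, women's line 0.4 grade points higher.
x = list(range(1, 10))
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05]
men = (x, [1.0 + 0.3 * xi + e for xi, e in zip(x, noise)])
women = (x, [1.4 + 0.3 * xi + e for xi, e in zip(x, noise)])
stats = compare_regressions(women, men)
print(stats)
```

With these data the first two ratios are essentially 1 and 0, while the intercept ratio is large (about 111 on 1 and 15 degrees of freedom), the pattern the text reports for most of the schools.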

The difference was most striking in the case of the Radio Mechanic School. On a five-point grade scale, with 2.5 the passing grade and 5.0 the maximum grade, the grade predicted from an aptitude index of five was .42 grade points higher for WAF than for white males. The difference was .30 grade points in the Radio Operator School, .32 grade points in the first Clerk-Typist School, and .24 grade points in the second Teletype Operator School. In most schools, even larger differences in favor of WAF personnel occurred in the grades predicted from an aptitude index of nine. The Photo Laboratory Technician School was the only one in which the difference was slightly in favor of white males.

At  the  .05  level  of  significance  the  slope  of  the  regression  line 
for  the  WAF  group  differed  from  that  for  the  white  male  group  in  three 
samples.  The  difference  in  slopes  is  apparent  in  the  differences  in 
predicted  grades  for  aptitude  indexes  of  five  and  nine  presented  in  Table 
Table 9

Differences in Final School Grades Predicted from Minimum and Maximum
Aptitude Indexes of the White WAF Group from the White Male Group

Adapted from Gordon (78, p. 6)

                                    Aptitude  Difference between WAF and white male
Technical School                    Index     predicted final school grade (grade units)*
Clerk-Typist                        5         .32
                                    9         .56
Supply Technician                   5         .14
                                    9         (illegible)
Teletype Operator                   5         .11
                                    9         .22
Radio Operator                      5         .30
                                    9         .39
Radar Operator                      5         .09
                                    9         .09
Radio Mechanic                      5         .42
                                    9         (illegible)
Weather Observer                    5         .05
                                    9         .40
Clerk-Typist (second sample)        5         .24
                                    9         .38
Teletype Operator (second sample)   5         .24
                                    9         .21
Photo Laboratory Technician         5         -.05
                                    9         -.07

*A positive figure indicates that the predicted grade for WAF was higher than that predicted for white male airmen; a negative figure indicates that it was lower.




9. The respective differences are .32 and .56 for the first Clerk-Typist sample, .?2 for Supply Technician, and .05 and .40 for Weather Observer. In each case the intercept (at an arbitrary stanine of 5) is higher for the WAF group.

Gordon pointed out that further examination of the data showed there was no combination of predictors, for these or other samples, that would equalize the group differences in regressions. When the regression of final grades on Biographical Inventory scores was similarly studied, a significant difference in intercepts was found for all samples.

Gordon concluded by recommending that, when a minimum aptitude index of five is required of white males, the following minimum indexes should be used for WAF:

(1) An aptitude index of four for schools in the Clerical Cluster.

(2) An aptitude index of four for schools in the Radio Operator Cluster.

(3) An aptitude index of five for schools in the Technician Specialty Cluster.

(4) An aptitude index of two for schools in the Mechanical Cluster. As results were based on only one school, Gordon offered this recommendation only tentatively.

Gordon added in a footnote, however, that "If different minimum aptitude indexes are to be recommended, it is important to determine regions of significant differences between sample regression lines which are non-parallel" (78, p. 8).
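Gordon's recommendations amount to finding, for each cluster, the WAF aptitude index whose predicted grade matches the grade predicted for white males at the minimum of five. A hypothetical Python sketch of that point calculation follows, assuming known regression lines on the stanine scale (1 to 9). As her footnote warns, it ignores sampling error, and when the lines are not parallel the equivalence holds only at a single point.

```python
import math


def equivalent_cutoff(men_line, women_line, men_cutoff, lowest=1, highest=9):
    """Smallest whole aptitude index (stanine) at which the women's line
    predicts at least the grade the men's line predicts at `men_cutoff`.

    Lines are (intercept, slope) pairs with positive slopes. Sampling
    error is ignored, so this is a point estimate only.
    """
    a_m, b_m = men_line
    a_w, b_w = women_line
    target = a_m + b_m * men_cutoff      # grade expected of men at the cutoff
    exact = (target - a_w) / b_w         # index giving that grade for women
    whole = math.ceil(exact - 1e-9)      # small tolerance guards float noise
    return exact, max(lowest, min(highest, whole))


# Hypothetical lines: women predicted 0.5 grade points higher at every index.
print(equivalent_cutoff((1.0, 0.25), (1.5, 0.25), 5))  # → (3.0, 3)
```

Rounding up to the next whole stanine keeps the women's minimum at least as demanding, in predicted-grade terms, as the men's; rounding down would be the more lenient choice.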

Among the factors Gordon considered in her attempt to explain why WAF personnel earned higher course grades than white males having the same aptitude index were the following: (1) possible cultural bias of tests in the Airman Classification Battery; (2) greater motivation on the part of the WAF; (3) bias in the final school grade in favor of WAF because of neatness, superior spelling, and a greater ability to express themselves; or (4) a general tendency of women to overachieve in school.
The work of Gordon and that of Howard and Pickrel have provided the most pertinent and hence most valuable material in terms of the problems faced by the Navy. These studies are, in fact, the only ones which give some clear indication of the relative performance of men and women on Navy-like tests and on Navy-like jobs. Here, then, is a strong suggestion that predictions from scores on the Navy Basic Test Battery may differ in meaning when women are measured from predictions when men are measured, although with some adjustment in minimum cutting points the scores may yet be extremely useful.

The final test of the advisability of lowering the minimum qualifying scores for technical schools in the case of women must rest, of course, upon information derived from measurement of on-the-job proficiency. Such information is not as yet available in the literature, although in this connection Wickham (191, p. 169) noted a parallelism in the British studies between training validities and on-the-job validities: "The results found for follow-up in the working units practically duplicated those found for training."


V. Civilian Studies of the Selection of Women for Navy-like Jobs Using Tests like Those in the Navy Basic Test Battery


A.  Clerical  Workers 

The Navy utilizes a considerable number of enlisted women in clerical positions. Reports on the selection of women for clerical work form the largest single group of research studies in the literature which may be considered relevant to this survey. Tests similar to some of the Navy's tests, such as the Minnesota Clerical Test, have frequently been used. Unfortunately, the literature afforded little by way of comparisons between the selection of male clerical workers and the selection of female clerical workers. Most clerical workers are women, and most studies of selection for clerical work have been based on women. Occasionally men formed part of a group that was studied, but usually no breakdown of the results by sex was attempted. The policy has seemingly been, rather, to ensure greater experimental control by omitting data from one sex or the other, especially in cases similar to that reported by one author (103) in which significant sex differences in performance were noted.

The discussion, therefore, will be limited mainly to the selection of women for clerical work. In general, tests found to be reasonably successful in selecting clerical workers were of three types: tests of general intelligence, tests of clerical aptitude, and measures of performance on work samples involving typing, filing, and similar tasks. In several cases a battery combining tests of two or more of these types was used.

It should perhaps be added that in none of the multiple correlation studies on the selection of clerical workers reported here was there evidence of cross-validation, nor was evidence cited to demonstrate that cutting scores derived from one sample would serve as successfully to select good workers in another sample.

1. The Use of Test Batteries in Selecting Clerical Workers. One of the earliest studies reporting the use of a battery in selecting clerical workers was conducted in 1921 by Bills (24, 25). She concluded from her study of 139 applicants for courses in stenography and comptometer operation that a battery of tests was more effective than any single test in picking successes and eliminating failures. Her battery included a test of general intelligence adapted from the Army Alpha and a test of aptitude for typing and stenography. Of single tests, she concluded that a test of general intelligence was most effective for eliminating persons likely to fail and that a test of special ability, e.g., stenography and typewriting, was most effective in picking successes.

For selecting bank machine bookkeepers, Hay (88) reported the successful use of two batteries similar to Bills' in that they utilized clerical aptitude and general intelligence measures as selectors. One battery consisted of Alpha Number Series, Minnesota Numbers, and Fryer Name Finding; the other contained the Otis Self-Administering Test of Mental Ability (Form B), Minnesota Names, and Minnesota Numbers.

Using the Wherry-Doolittle method for selecting a test battery, Holmes (96) found that a combination of the Wonderlic Personnel Test with a typing test yielded a multiple correlation of .48 with criterion ratings for 88 secretaries; a combination of a cancellation test from the State Farm Personnel Survey and a typing test yielded a multiple correlation of .V* with criterion ratings for 56 typists; and a combination of the Wonderlic Personnel Test with the General Clerical Ability Test from the State Farm Personnel Survey yielded a multiple correlation of .52 with criterion ratings for 107 clerk-typists.
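A multiple correlation whose weights are computed in the same sample overstates how well the battery will predict in a new sample, which is why the absence of cross-validation noted earlier matters. The Wherry-Doolittle method, as usually described, adds tests one at a time and uses Wherry's shrinkage formula to decide when another test no longer helps. The formula itself is sketched below in Python and applied, for illustration, to Holmes's figures for clerk-typists (R = .52, N = 107, two tests). The shrunken value is an estimate and is no substitute for cross-validation.

```python
import math


def wherry_shrunken_r(r: float, n: int, k: int) -> float:
    """Wherry's shrinkage estimate of a multiple correlation.

    r: sample multiple correlation; n: sample size; k: number of predictors.
    Returns 0.0 when the adjusted R^2 would be negative.
    """
    r2_adj = 1 - (1 - r * r) * (n - 1) / (n - k - 1)
    return math.sqrt(max(r2_adj, 0.0))


print(round(wherry_shrunken_r(0.52, 107, 2), 2))  # → 0.51
```

With only two predictors and over a hundred cases the shrinkage is small; it grows quickly as the number of tests approaches the number of cases.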

Hay (89) reported a multiple correlation of .38 between scores on a combination of the Minnesota Clerical Test, a number series completion test, and a name finding test, and supervisors' ratings for a group of 82 key-punch operators. In a later article (90), Hay reported the predictive efficiency of a battery of five tests applied to 82 key-punch operators. The tests used were the Wonderlic Personnel Test, Minnesota Clerical (Numbers and Names), Hay Number Series A, and Hay Name Finding. Five different combinations of cutting scores were successful to the extent of selecting a group of which 85 per cent or more of the workers were rated "good" by supervisors. The proportion of workers rated "good" in the group falling below the cutting score varied from 15 per cent to per cent in the five examples, and in two of the five cases more than 50 per cent of the workers rated "good" fell below the cutting score. No cross-validation was reported.
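Hay's evaluation involves two different proportions that are easy to confuse: the share of workers passing every cutting score who were rated "good," and the share of all "good" workers who were screened out. The Python sketch below computes both for a set of multiple cutting scores; the test names, scores, and cutoffs are hypothetical.

```python
def evaluate_cutoffs(workers, cutoffs):
    """Apply minimum scores on several tests ("multiple cutting scores").

    workers: list of (scores_dict, rated_good) pairs.
    cutoffs: dict mapping test name to minimum passing score.
    Returns (share rated good among those passing every cutoff,
             share of all good workers who fail at least one cutoff).
    """
    passed = [good for scores, good in workers
              if all(scores[test] >= c for test, c in cutoffs.items())]
    total_good = sum(good for _, good in workers)
    good_passed = sum(passed)
    share_good_selected = good_passed / len(passed) if passed else 0.0
    share_good_rejected = ((total_good - good_passed) / total_good
                           if total_good else 0.0)
    return share_good_selected, share_good_rejected


# Hypothetical scores on two invented tests, with supervisors' good/poor rating.
workers = [
    ({"verbal": 30, "numbers": 120}, True),
    ({"verbal": 25, "numbers": 110}, True),
    ({"verbal": 18, "numbers": 130}, True),
    ({"verbal": 28, "numbers": 90}, False),
    ({"verbal": 15, "numbers": 80}, False),
    ({"verbal": 22, "numbers": 105}, True),
]
cutoffs = {"verbal": 20, "numbers": 100}
print(evaluate_cutoffs(workers, cutoffs))  # → (1.0, 0.25)
```

Here every selected worker is "good," yet a quarter of the good workers were rejected, the same kind of trade-off Hay's figures show.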

2. Reported Validities for Single Tests Used in Selecting Clerical Workers. Reports of the relations between various criteria of job success for clerical workers and scores on tests taken singly are available from some of the studies referred to above. Tests similar to the Navy's tests have been emphasized in the discussion below.

Validity coefficients for the Minnesota Vocational Test for Clerical Workers have been reported as follows: .30 with supervisors' ratings for 82 key-punch operators by Hay (89); Minnesota Numbers .62 and Minnesota Names .54 with measures of production for 27 typists by Blakemore (28). Although he provided no validity coefficients, Barrett (11) indicated that the Minnesota test differentiated between good and poor students in a typewriting course. For a group of 39 machine bookkeepers, Hay (88) reported a correlation with production of .51 for Minnesota Numbers and of .47 for Minnesota Names. In a study of 120 bank clerks and typists, Seashore (158) reported validity coefficients for Minnesota Numbers of .29 with speed, .36 with accuracy, and .30 with "ability to grasp new ideas." Minnesota Names correlated .38, .39, and .45 with these criteria, and the total score gave corresponding coefficients of .38, .42, and .43.

In  one  study  Hay  (89)  reported  the  following  correlations  with 
supervisory  ratings:  .25  for  Number  Series  Completion,  .26  for  Name 
Finding,  and  .30  for  the  Minnesota  Clerical.  The  subjects  were  a group 
of 82 key-punch operators, presumably female.

In studying 27 female typists for whom the criterion was a measure of production, Blakemore (28) found that the Hay Number Perception Test and the Minnesota Clerical Test (Numbers) both had validity coefficients of .62. However, scores from the Hay Number Series test yielded no significant correlation in the same situation.

Holmes (95, 96) found that the general clerical sub-test of the State Farm Personnel Survey had the following validities against supervisors' ratings: .00 for a group of 88 secretaries, -.15 for a group of 56 typists, .42 for a group of 107 clerk-typists, .52 for a group of 100 supervisory personnel, and .49 for 50 interpretive personnel. Corresponding validities for the general information sub-test were .25, -.03, .15, .43, and .16 respectively. The cancellation test from the same battery yielded correlations of .08 for 88 secretaries, .33 for 56 typists, .11 for 107 clerk-typists, .62 for 100 supervisory personnel, .23 for 50 interpretive personnel, and .42 for a group of 50 key-punch and verifier operators.
In a study of office personnel, Giese (74) reported the following correlations between supervisors' ratings and various scores on the General Clerical Test: for the total score, .41 (N = 26); the clerical subscore, .34 (N = 38); the numerical subscore, .62 (N = 41); the verbal subscore, .48 (N = 26). A study of clerical workers cited in the manual of the General Clerical Test (144) reported correlations with performance ratings of .43 (N = 73) for the total score, .45 (N = 68) for the clerical subscore, .22 (N = 71) for the numerical subscore, and .42 (N = 70) for the verbal subscore.

Seashore (158) also reported validity coefficients for the General Clerical Test. For 120 clerks and typists these ranged from .56 when the criterion was ability to grasp new ideas as rated by supervisors, through .48 with ratings of speed, to .46 with ratings of accuracy. A biserial coefficient of .32 with grades in a training course was observed for a group of 116 clerical workers. In the latter study, scores on the SRA Clerical Test gave a biserial correlation of .36 with letter grade in the training course.
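Several of the coefficients in this section are biserial correlations, in which the criterion is a pass/fail or good/poor split assumed to reflect an underlying normal distribution. The Python sketch below applies the usual textbook formula, r_bis = (M_p - M_q) / s_t * (p * q / y), where y is the normal ordinate at the point dividing the proportions p and q. The data are hypothetical, and the studies cited do not report their computations, so this illustrates the coefficient rather than their procedures.

```python
import math
from statistics import NormalDist


def biserial(scores, passed):
    """Biserial correlation between a continuous score and a dichotomy
    assumed to reflect an underlying normally distributed variable."""
    n = len(scores)
    hi = [s for s, ok in zip(scores, passed) if ok]
    lo = [s for s, ok in zip(scores, passed) if not ok]
    p = len(hi) / n
    q = 1 - p
    mean = sum(scores) / n
    s_t = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)  # population SD
    y = NormalDist().pdf(NormalDist().inv_cdf(q))  # ordinate at the split point
    return (sum(hi) / len(hi) - sum(lo) / len(lo)) / s_t * (p * q / y)


# Hypothetical test scores and pass/fail in a training course.
scores = [2, 4, 6, 8, 3, 5, 7, 9]
passed = [False, False, True, True, False, True, False, True]
print(round(biserial(scores, passed), 2))  # → 0.82
```

The biserial coefficient is always larger in absolute value than the point-biserial coefficient computed from the same data, so the two kinds of figure should not be compared directly.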

The use of the American edition of the NIIP Clerical Test in the selection of students for college library work was reported by Oberheim (134, 135). Scores on the clerical test correlated significantly with the criterion ratings for men and women combined. In one sample the correlations were higher for men, and not significant for women.

Although Oberheim (134) found that scores on the American Council on Education Psychological Examination correlated .42 with success in library work for men and women combined, she was reluctant to draw any conclusions about the relative value of the clerical test and the ACEPE in the selection of women because all correlations for the women were low. However, she continued to use a combination of the ACEPE, the NIIP Clerical, and college grades as a predictor.

Holmes (95) found a correlation of .47 between scores on the Otis Self-Administering Test of Mental Ability and criterion ratings for a "skilled" group of office personnel. Hay (88) reported a correlation of .56 between scores on the Otis S.A. (20-minute version) and production measures for 39 machine bookkeepers. In the previously mentioned report by Seashore (158) of a study of 120 bank clerks and typists for whom the criteria were ratings made by supervisors, Otis scores correlated .17 with speed, .16 with accuracy, and .38 with ability to grasp new ideas.

Holmes (95, 96) reported the following correlations between scores on the Wonderlic Personnel Test and criterion ratings: .33 for secretaries, .22 for typists, .36 for clerk-typists, .51 and .49 for supervisory office personnel, and .56 for an "interpretive" group of office workers. Hay (90) reported a significant difference between mean scores on the Wonderlic for 53 key-punch operators rated "good" and 29 rated "poor." For 27 typists, Blakemore (28) found a validity coefficient of .32 for the Wonderlic test when a measure of production was the criterion.

In studying 116 clerical trainees, Seashore (158) reported a biserial correlation of .35 between scores on the Wonderlic and letter grades in a training course.

Tiffin and Lawshe (176) reported higher scores on an Adaptability Test for 50 good clerical workers than for 38 poorer workers. Rogers (149) used four subtests of the Woodworth-Wells Series (verb-object, number-checking, color-naming, and action-agent) in studying three groups. For one group of 77 students the criterion was mid-year grades in stenography, grammar, and typewriting; for the other two groups, consisting of clerical workers (N = 58 and 65), the criterion was a measure of production. Rogers found that these subtests correlated more highly with course grades than with measures of production, the coefficients ranging from .39 to .46 when course grades were the criterion, and from .13 to .39 when a measure of production was the criterion.

In a study of the effectiveness of grades on the Civil Service Examination for discriminating between various levels of card-punching efficiency, Marcus (125) found that a team of five subtests of the Woodworth-Wells Series correlated .45 with efficiency records based on speed and accuracy.

A general intelligence test adapted from the Army Alpha was found by Bills (25) to make correct predictions for 85 per cent of the applicants in courses in stenography and comptometer operation: those who both failed the test and failed the course, together with those who both passed the test and passed the course, made up 85 per cent of the sample studied. Further computation by one of the present writers yielded a contingency coefficient for Bills' data in the neighborhood of .56 for a group of 67 subjects.
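The contingency coefficient is Pearson's C = sqrt(chi2 / (chi2 + N)), where chi2 is computed from the table of test results against course results. The sketch below assumes a 2 x 2 pass/fail table; the cell counts are hypothetical, chosen only to give 67 cases and an 85 per cent hit rate, and they are not Bills' data.

```python
import math

def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + N))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if test result and course result were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))

# Hypothetical counts: rows = passed/failed test, columns = passed/failed course
table = [[40, 5],
         [5, 17]]
hits = table[0][0] + table[1][1]           # correct predictions on the diagonal
n = sum(map(sum, table))
print(f"hit rate = {hits / n:.2f}")         # hit rate = 0.85
print(f"C = {contingency_coefficient(table):.2f}")  # C = 0.55
```

For a 2 x 2 table C cannot exceed sqrt(1/2), about .71, so a contingency coefficient of this size is not directly comparable with a product-moment r.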

Several correlation coefficients between scores on subtests of the Army Alpha and production measures for 39 machine bookkeepers were reported by Hay (88) as follows: total score, .51; number series, .56; same-opposite, .47; verbal, .47; numerical, .44; relationships, .43; analogies, .42; information, .40; sentences, .40; arithmetic, .37; directions, .32.

Kinney (103) reported biserial correlations of .63 between scores in addition and ratings for 77 female mail-order house clerical workers, and of .21 between scores in addition and ratings for 54 wholesale office workers. Seashore (158) reported correlations of .28, .25, and .39 between Alpha Number Series and supervisors' ratings of speed, accuracy, and ability to grasp new ideas for 120 clerks and typists in a bank.
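A biserial correlation relates a continuous test score to a criterion split into two categories (for example, rated above or below average), on the assumption that the split divides an underlying normally distributed variable. A minimal sketch of the usual formula, r_bis = ((M_p - M_q) / s) * (p * q / y), where y is the height of the normal curve at the point of division; the test scores below are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev

def biserial(high_scores, low_scores):
    """Biserial r between test score and a dichotomized (assumed normal) criterion."""
    all_scores = high_scores + low_scores
    p = len(high_scores) / len(all_scores)   # proportion in the upper criterion group
    q = 1 - p
    s = pstdev(all_scores)                   # SD of the whole sample
    # Ordinate of the standard normal curve at the point dividing q from p
    y = NormalDist().pdf(NormalDist().inv_cdf(q))
    return (fmean(high_scores) - fmean(low_scores)) / s * (p * q / y)

# Hypothetical test scores for workers rated above and below average
high = [12, 14, 15, 17]
low = [8, 10, 11, 13]
print(round(biserial(high, low), 2))
```

Because it assumes a normal criterion beneath the dichotomy, the biserial runs higher than the point-biserial for the same data and can even exceed 1.0 in small samples.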

In spite of the large number of studies on the selection of clerical workers, the implications of the findings are by no means clear or unequivocal. The authors of these studies have made no attempt to relate the choice of criteria to the success of the tests. Criteria vary widely, from quite specific measures such as speed, accuracy, or a combination of both, to global ratings of the employee's general worth to the company. Nor has the level of responsibility held by the employee, undoubtedly an important variable, been related systematically to the selection measure except in a few cases such as that reported by Hay (88).

The reasonable effectiveness of such tests as the Minnesota Clerical Test, the General Clerical Test, the SRA Clerical Test, and the NIIP Clerical Test has been amply demonstrated. Correlations with the criteria reported by authors quoted in this review tend to cluster around .42. Intelligence tests reported in this review, such as the Otis SA, the Wonderlic Personnel Test, the Woodworth-Wells Series, and selected subtests of the Army Alpha, yielded correlations with the various criteria averaging in the neighborhood of .40.

By  surveying  the  literature  and  grouping  reported  validities 
according  to  job  and  test,  Ghiselli  and  Brown  (72)  presented  a more 
clearly  organized  picture  of  the  status  of  tests  in  predicting  the  train- 
ability  and  proficiency  of  clerical  workers  than  had  hitherto  been  avail- 
able. They  compiled  considerable  information  about  three  types  of 
clerical  workers:  general  clerks,  recording  clerks,  and  computing  clerks. 
While  no  reference  was  made  to  the  sex  of  the  subjects  in  their  article, 
some  of  the  conclusions  drawn  by  Ghiselli  and  Brown  with  regard  to 
clerical  workers  might  be  considered  relevant  to  this  review,  for  it  is 
likely  that  the  samples  were  made  up  largely  of  women. 

On the basis of Ghiselli and Brown's analysis of available data, the tests having the consistently highest validities for all three types of clerical workers appeared to be arithmetic and number comparison tests. The mean of the reported validity coefficients for an arithmetic test was .43 for general clerks, .41 for recording clerks, and .35 for computing clerks. The mean reported validities for number comparison for these groups were .42, .29, and .33 respectively. For general clerks alone, intelligence, arithmetic, number comparison, and name comparison tests were the best tests and were about equal in validity. The mean of the reported validities for intelligence tests with this group was .42; for arithmetic, .43; for number comparison, .42; and for name comparison, .40. (The mean validity coefficients reported by Ghiselli and Brown were weighted means computed through Fisher's z' transformation.)
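A weighted mean through Fisher's z' transformation converts each coefficient to z' = artanh(r), averages the z' values with weights, and converts the mean back to r. The weights Ghiselli and Brown used are not stated here; the sketch assumes the common choice of N - 3 per study, and the three (r, N) pairs are hypothetical.

```python
import math

def mean_validity(studies):
    """Weighted mean r via Fisher's z' = atanh(r); weights of N - 3 are an assumption."""
    total_weight = sum(n - 3 for _, n in studies)
    mean_z = sum((n - 3) * math.atanh(r) for r, n in studies) / total_weight
    return math.tanh(mean_z)              # back-transform the mean z' to r

# Hypothetical validity coefficients (r, N) for one test and one job family
studies = [(0.35, 116), (0.45, 80), (0.50, 40)]
print(round(mean_validity(studies), 2))
```

The transformation is used because the sampling distribution of r is skewed for large values; for positive coefficients, a mean computed through z' is never lower than the correspondingly weighted mean of the r values themselves.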

Although Ghiselli and Brown were able to compile this information about three types of clerical work, they did not (or were unable to) consider the reports in the literature in terms of choice of criteria, level of job, or degree of responsibility held by the worker. It is possible that some information regarding the differential value of the tests is lost when such an analysis is not carried out. For general clerks, for instance, intelligence tests and number and name comparison tests are about equal in validity. However, if either type of test is used alone, it is more than likely that it will not select the same sort of clerical worker as the other. It is quite possible that, with proper controls for these factors, the clerical test might prove superior in selecting for lower-level positions, and that as the responsibility of the clerical position increases, the value of an intelligence measure would likewise increase. There is, however, little available information from civilian research which clearly demonstrates this to be so, although many selection procedures may well rest on an implicit acceptance of such a hypothesis. None of the studies reported here explicitly differentiated between the levels of clerical job being studied or attempted to trace any variation in the relationship between success on the job and scores on the tests through the various levels of clerical work. No further analysis of the reported results is possible, as no descriptions of the complexity or responsibility of the work were provided.

B. Industrial Workers

Typically, studies in this area have involved male subjects.

In the few instances where women have been studied, the jobs have tended to involve tasks such as sewing machine operation or inspection-packing, for which there are no Navy counterparts (Ghiselli (70), Maher and Fife (124), Blum and Candee (31, 32), and Grauer (79)). To the extent that published research adequately reflects the type of employment found by women in industry, one is led to the conclusion that industrial jobs tend to be held by one sex or the other, with little intermingling of the sexes on one job within one industry or plant.

As a result, there is almost no reported civilian research which provides a basis for comparing the sexes in selection for the same job by the same measure. When the search is limited to selection for jobs with tests similar to those used in the Navy, the field is narrowed even more drastically. One small study by Forlano and Kirkpatrick (64) of 20 female radio tube mounters reports the use of the Otis SA along with two measures of personality. A study of 33 women trainee telephone mechanics reported by Oxlade (140) refers to the use of the Otis, the Progressive Matrices, and the ACER Mechanical Comprehension tests.

Since the criterion in the latter study was a highly theoretical examination, and the samples in both studies were small, it is difficult to appraise the significance of the results.

A study by Bolanovich (33) presented an analysis of data from the records of 66 Radio Corporation of America Engineering Cadettes (female) who attended a ten months' electronics engineering course at Purdue University preparatory to entering jobs as engineering aides in six manufacturing plants of the company. Grade-point averages showed significant correlations with the Cooperative General Mathematics Test for high school students (r = .55), the Wonderlic Personnel Test (r = .50), previous school grades (r = .50), a fitness rating (r = .38), and a personality rating (r = .32). A shrunken multiple correlation of .61 was found between grade-point averages and a combination of score on the ACE mathematics test and previous school grades. Other tests failed to raise this correlation.
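A "shrunken" multiple correlation adjusts the sample R downward for the capitalization on chance that occurs when predictor weights are fitted to the same sample. The report does not say which formula Bolanovich used; the sketch assumes Wherry's adjustment, R_c^2 = 1 - (1 - R^2)(N - 1)/(N - k - 1), and the unadjusted R of .63 fed to it is hypothetical.

```python
import math

def shrunken_r(r, n, k):
    """Wherry-adjusted multiple correlation for n cases and k predictors."""
    adjusted_r2 = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    # Report zero if the adjustment drives the squared correlation below zero
    return math.sqrt(max(adjusted_r2, 0.0))

# 66 cadettes, 2 predictors (mathematics test and school grades); raw R is assumed
print(round(shrunken_r(0.63, 66, 2), 2))
```

The shrinkage grows as the number of predictors approaches the number of cases, one reason small-sample multiple correlations in this literature should be read cautiously.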
VI. Civilian Studies of the Selection of Women for Navy-like
Jobs Using Tests Unlike Those in the Navy Basic Test Battery

A.  Clerical  Workers 

Tests of typing ability are frequently used in selecting clerical workers. Giese (74) reported a correlation of .64 between scores on the Kimberly-Clark Typing Ability Analysis Test and ratings by supervisors for 24 clerical employees. Barrett (11) found that total scores from the Turse Shorthand Aptitude Test differentiated between "good" and "poor" typing students (N = 96) as measured by grades; transcription and phonetic association subscores from the same test were stated to differentiate between "good" and "average" stenography students (N = 75).

Holmes (96) reported validity coefficients for tests of typing and shorthand against the criterion of pooled supervisors' ratings. These were .36 and .28 respectively for 88 secretaries; .35 and .29 for 56 typists; and .14 and -.18 for 107 clerk typists.

Several studies, e.g., Dodge (56, 57) and McMurry (123), have attempted to use personality measures as predictors of success in clerical work, but they are not considered relevant to this review and will not be discussed in detail.

B.  Industrial  Workers 

The selection of women for the industrial jobs which they most commonly hold, such as inspecting, packing, assembling, and operating power sewing machines, tends to involve measures of special abilities quite unlike the tests used by the Navy. Many manual dexterity tests, such as the Minnesota Placing and Turning Test and the tests designed by MacQuarrie and O'Connor, have been used with a moderate degree of success. Reported validity coefficients range from .34 to .62 and appear to cluster around .54. Cross-validation, however, is on the whole conspicuous by its absence.

The studies of power sewing machine operators by Treat (179), Otis (138), Ruch (153), and Glanz (75) will be mentioned only in passing, as there is no comparable Navy job. Similarly brief mention is made of the several studies by Walker (189), Ayers (8), Coleman (47), and Kerr (102) on visual factors in job success, and of the work of Bolanovich (34) on the influence of interest testing in reducing factory turnover.

Tiffin and Greenly (175) found that, in two of three groups studied, the O'Connor Finger Dexterity Test picked radio and electric fixture assemblers who were above average in amount and quality of production on the assembly line. Reported correlations with pooled ratings of efficiency were .33 for a group of 35 and .20 for a group of 42 operators. A hand precision test picked operators who were above average, particularly in quality of work, yielding an r of .63 for one group of 36 operators and of .24 for another group of 35 operators. For a third group of 42 radio assemblers the same test was useful (r = .23) when the error score alone was used. Performance on vision tests varied inversely with production. A best-weighted combination of scores from the finger dexterity test, hand precision test, visual acuity, and color vision tests was reported to yield a multiple correlation with pooled ratings of efficiency of .60 for the third group of 42 radio assemblers.

Tiffin  and  Rogers  (177)  reported  findings  for  150  inspectors 
engaged  in  examining  the  quality  of  tin  plate.  For  this  sample,  visual 
discrimination,  height,  and  weight  were  as  important  as  manual  dexterity. 
Other  authors  have  reported  greater  success  with  manual  dexterity  tests. 
Rusmore  (154)  reported  validity  coefficients  of  .49  and  .60  with  ratings 
of  supervisors  in  Jobs  of  inspecting,  labelling,  and  packaging  based  on 
a sample  of  28  women.  Two  studies  of  the  relationship  between  scores  on 
the  MacQuarrie  test  and  ratings  of  329  radio  assemblers  were  conducted 
by  Goodman  (76,  77).  In  one  study  (77)  validities  were  not  reported  but 
were  considered  by  the  author  to  be  significant.  In  the  other  study  (76) 
Goodman  reported  a zero-order  correlation  of  .42  between  instructor’s 
ratings  and  total  scores  on  the  test  and  a multiple  correlation  of  .46 
with  scores  on  the  subtests  of  the  MacQuarrie. 

In Surgent's (166) study of 233 radio-tube mounters, in which the criterion was the pooled ratings of the supervisor and the instructors, validity coefficients were .56 for the Minnesota Placing Test, .50 for the Minnesota Turning Test, .48 for the O'Connor Tweezer Dexterity Test, and .64 for the Purdue Pegboard Assembly Test. Optimal combination of the tests produced a multiple R of .16.

Blum (30) reported a significant relationship between scores on the O'Connor Finger and Tweezer Dexterity tests and success on the job for applicants accepted in a watch factory. As a result of a similar study, Candee and Blum (44) suggested the use of the Tweezer Dexterity Test in the initial selection of workers for this factory and the use of the Finger Dexterity Test in selecting superior workers. Similarly, Hines and O'Connor (93) reported the successful use of the O'Connor Finger Dexterity Test in selecting women for jobs involving fine meter or instrument work in the West Lynn plant of the General Electric Company.

Hayes (92) reported significant differences in scores on two pegboards (one of which was the O'Connor pegboard) between workers identified as "quick" and "slow" learners. His findings were based on the study of 1541 women engaged as coil winders, drill and punch press operators, operators of insulating machines, and bench hands. The conclusions, however, need qualifying: groups were dichotomized differently for each of his eight samples, and critical ratios were employed where small-sample techniques should have been used in view of the size of the samples.

It is evident that tests of manual dexterity have been used with considerable success in selecting women for a number of industrial positions. There is no evidence available to indicate whether or not such tests would require different interpretation in predicting the success of men on the same jobs. Furthermore, there is no indication in the reported results that these tests would be an important addition to the tests currently used in the Navy. Further study would be necessary to determine the usefulness of dexterity tests as a supplement to the basic battery in selection for jobs in the Navy.
VII. Sex Differences in Scores on Tests
Like Those in the Navy Basic Test Battery

Norms based on samples of both men and women are available for several tests which are similar to the Navy's tests. When such tests are being considered for use in predicting future job performance, it is important to know whether significant sex differences have been noted on them. Normative data alone can provide no solutions to problems of the adequacy of prediction but may, at least, point to areas in which further study is needed. Whenever significant sex differences are observed, questions such as the following immediately become important: Is the test providing a measure of the same skills for women that it does for men? Do the scores have the same significance for women that they have for men; that is, will a woman with a given score on the test perform at the same level on the job as will a man with the same test score?

Some data on sex differences in performance on clerical tests and on tests of mechanical comprehension have therefore been considered important because they may serve to raise these and other questions. Reports citing such findings are included below. It is not, of course, to be implied that the absence of any report of sex differences on a test indicates that the test is successfully measuring the same ability or skill to the same extent and in the same way for both men and women.

A.  Clerical  Tests 

Of  the  many  tests  which  purport  to  measure  clerical  aptitude,  few 
are  provided  with  separate  norms  for  men  and  women.  Separate  normative 
data  based  on  adequate  samples  of  both  sexes  are  available  for  only  two 
tests,  the  Minnesota  Clerical  Test  and  the  Psychological  Corporation's 
General  Clerical  Test. 

The norms which accompanied the publication in 1933 of the Minnesota Clerical Test (4) showed that, for samples drawn from the general population, women were decidedly superior to men on this test. The superiority persisted in studies of clerical workers, although it was less marked. In the manual accompanying the 1946 revision (5) the authors reported that studies carried out in the interim period had confirmed the original findings. Only 16 per cent of men reached or exceeded the median for women in samples presumed to represent the general population, and only 21 per cent of the men reached or exceeded the median for women in samples from clerical populations. The authors noted, however, that these differences tended to disappear when comparisons were limited to a given type or level of clerical job. The data presented in the manual were based in part upon research carried out by Loevinger (116), Thatcher (171), and Schneidler and Paterson (155). These studies traced the sex differences back to the fifth-grade level. Engelhardt (62) has noted similar differences for college groups.
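The overlap figures above can be translated into a standardized mean difference if one assumes that men's and women's scores are normally distributed with equal standard deviations, an assumption the manual's figures do not by themselves establish. Under that assumption, if a proportion p of men reach or exceed the women's median, the women's mean exceeds the men's by Φ⁻¹(1 - p) standard deviations.

```python
from statistics import NormalDist

def overlap_to_d(p_men_at_or_above_women_median):
    """Women's mean minus men's mean in SD units, assuming equal-SD normal curves."""
    return NormalDist().inv_cdf(1 - p_men_at_or_above_women_median)

# Proportions quoted from the Minnesota Clerical Test manual
for label, p in [("general population", 0.16), ("clerical workers", 0.21)]:
    print(f"{label}: d = {overlap_to_d(p):.2f}")
```

Under these assumptions the general-population difference is roughly one standard deviation and the clerical-sample difference somewhat smaller, consistent with the manual's remark that the superiority was less marked among clerical workers.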

A similar pattern of sex differences is observed in the published norms for the Psychological Corporation's Clerical Test (144). Women are on the average superior to men on the verbal and clerical subscores and on the total score; the differences, however, are not large. Men tend to score higher on the numerical subtests (for which there is no counterpart in the Minnesota Clerical Test). The norms for the Australian Council for Educational Research Speed and Accuracy Test (6) show women to be superior at every age level for which data were available.

A review of the manuals for other clerical tests reveals that norms are frequently based entirely upon the testing of female samples. This is true for the Bennett Stenographic Aptitude Test (15) and the Short Employment Tests (20). Other test manuals present one set of norms for both sexes with no reference to the sex of the original sample. This is the case for the ERC Stenographic Aptitude Test (55), the Acorn Clerical Aptitude Test (104), the Chicago Test of Clerical Promise (132), and the NIIP Clerical Test: American Revision (131).

Few of the authors discussed the possible origins of the reported sex differences. Hypotheses have been offered with extreme caution; those who, like Schneidler and Paterson (155), have ventured to speculate at all have shown an interest in the possible influences of differences in training and in encouragement to achieve in these areas.


B. Tests of Mechanical Comprehension


The most striking reports of sex differences were those for tests designed to measure mechanical aptitude. Bennett and Cruikshank (17), in a study of 390 girls and 338 boys of comparable age and education, found that the boys scored significantly higher on the Test of Mechanical Comprehension in grades ten, eleven, and twelve and in the first year of college. As a result of an item analysis, these authors found that certain items discriminated between the sexes to a far greater extent than others. This discovery probably led to the publication in 1951 of Form W1 of the Bennett test (19), consisting of items which supposedly do not discriminate against women.

In publishing norms for a test somewhat similar to the Bennett, the Australian Council for Educational Research (7) reported similar findings. In samples of 2,000 adult males and 1,000 adult females chosen to represent the general population, and in university samples of .20 males and 295 females, males scored higher than females. This sex difference was not found in studies using the Detroit Mechanical Aptitudes Examination (9). The latter test, however, contains arithmetic, motor ability, and assembly subtests, as well as a subtest of mechanical knowledge; it is, therefore, hardly comparable with the Bennett type of test.

Frederiksen (67) reported a difference in mean score of 3.95 points between a sample of 4857 men and 1340 women on the Mechanical Comprehension Test from the U. S. Navy's Officer Qualification Test. The men displayed slightly greater variability in scores, with a standard deviation of 3.63 as opposed to 2.93 for the women.

McElheny (120) related scores on Form AA of the Bennett test to scores on the Purdue Mechanical Assembly Test and also gave some information on the relative performance of the two sexes. Eighty male and twenty female college students were studied. The mean score for males indicated better performance than did that for females on both the Bennett and the Purdue tests.


The Minnesota Paper Form Board Test was stated by Likert and Quasha (115) to be the only test in the Minnesota Mechanical Ability Battery which gave satisfactory correlations with a criterion of mechanical ability. Sex differences are slight, although men tend to excel women according to Alteneder (3) and Tuckman (181). In a study of 1008 Pratt Art School freshmen, Bryan (40) found the mean to be the same (44.5) for both men and women, though the standard deviation was smaller for the men (9.4) than for the women (12.4).

In a study by Stephens (165) based on mid-year measurements of 1797 female and 1139 male seniors in New England high schools, the males scored consistently higher than the females on the Minnesota Paper Form Board Test, although the difference was very small. In a study of 25 men and 25 women college students, Bates, Wallace, and Henderson (13) found no large sex differences in performance on this test.

C.  Tests  of  Intelligence 

It is not the purpose of this report to deal with the extensive literature that has accumulated in the last quarter of a century on the question of sex differences in intelligence. Attitudes have changed, opinions are today less dogmatically expressed, and, as Kuznets and McNemar (106) have stated, when large unselected groups are used, age is taken into account, and possibilities of bias in the test content are allowed for, startling differences in average tendency or in variation simply fail to emerge.
VIII. Implications

As data have accumulated over the past forty years, earlier conceptions of the nature of sex differences in ability have slowly changed or been abandoned. Compared with earlier writings on the topic, the recent literature shows a noticeable tendency to attribute measured sex differences more to differences in cultural influences than to differences in inherited endowment. In the measurement of intelligence this newer viewpoint has for some time had a considerable impact on test development. The hypothesis that women are subject to different cultural pressures than men has, however, only recently influenced approaches to the measurement of special abilities such as mechanical aptitude.

This lack of influence may be attributed in part to the rather consistent patterns of employment that have prevailed for men and women in the past. As women have not ordinarily sought, in great numbers, jobs which demand considerable mechanical aptitude, the presence of sex differences on measures of this aptitude has tended to be of little concern to the practical psychologist. Even in times of emergency, women have been channeled into tasks in which their traditionally conceived superiority in manual dexterity or attention to clerical detail might be used to advantage, as in simple assembly work, inspection, power sewing-machine operation, or clerical work in the armed forces.

In the last two decades women have become a significant part of the labor force in an increasingly wide range of occupations. However, the frequency of their employment, rather than its nature, seems to have been the more striking change. The poorer performance of women on tests of arithmetic reasoning and mechanical aptitude has merely served to reinforce a well-conceived and time-honored view of the appropriateness of using sex as one basis for dividing the labor of the country. The arbitrariness of such a view, with its attendant assumption of a sex difference in endowment, was called into question only with the advent of an unusually disastrous war, and even then entirely as a result of hindsight.


The British, for instance, selected women to operate anti-aircraft guns and equipment and to serve as tinsmiths and pipe-fitters only because there were no men available to do these jobs. The follow-up data gathered and analyzed indicated that these and other jobs hitherto performed only by men had been performed quite creditably by women.

The  British  used  women  in  real  and  stressful  work  situations  but 
reported  little  data  that  would  permit  a comparative  analysis  of  the  rela- 
tion of  achievement  to  selection  test  scores  for  men  and  women.  This 
lack  of  information  is  partly  due  to  the  fact  that  the  British  wisely 
side-stepped  the  problem  by  making  no  attempt  to  select  women  with  exactly 
the  same  tests  and  minimum  cutting  scores  that  were  used  for  men. 

There is some indication from the British studies that properly chosen tests may be used with the same success with women as with men in the selection of personnel for jobs demanding mechanical skills. The studies reported by Wickham (191) and by Vernon and Parry (186) indicated that an assembly test was a useful supplement to the Bennett Test of Mechanical Comprehension in selecting women for such jobs. Similarly, there was some indication from the British research that on-the-job validities of the tests corresponded closely with validities obtained in the training-school situation.


To date, the U. S. Air Force seems to have made the greatest strides in tracing the relationship between scores on selection tests, administered to both men and women, and performance in technical schools.

In studying situations in which the same tests were given to members of both sexes who were then trained in the same setting, it appeared that women tended to obtain higher training grades than did men having the same scores on the aptitude measures. This finding may be interpreted in several ways, as Gordon (78) has suggested. Ignoring the possible sources of bias in the criterion measures, it may be said that women were under-measured by the selection tests, or that they over-achieved. It is also possible that both hypotheses are true in part. In any case, it is evident that a score on the Airman Classification Battery has a different meaning for a woman than it has for a man.
It is reasonable to argue that most selection tests for technical schools in the Air Force tend to measure information about, and familiarity with, particular technical operations. In our culture women are less likely to be acquainted with such operations than are men. However, since the selection variables do show positive correlations with the criterion for either sex taken alone, we must suppose that measures of information do have some relationship to aptitude, in that those who are talented tend to show interest and hence gain information. The problem would seem to be that of devising a selection procedure which would tap aptitude in women and men without placing at a disadvantage those whose opportunities for familiarity with certain materials or subject matter have been limited.

It might be possible to achieve this end in three different ways. First, different tests might be used with women; this method was employed by the British with reasonable success in several cases. Second, a study might be made of the regression of performance on selection test scores for the two sexes, as the Air Force has attempted to do. This might lead to the use of different minimum qualifying scores for women than for men. Third, it might be possible to devise new tests which measure each particular aptitude in the same way for both sexes; for instance, the speed of acquiring a new skill in a standardized learning situation might be a measure that is nearly independent of cultural influences.
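The second approach can be illustrated by fitting a separate least-squares line of training grade on test score for each sex and solving each line for the test score that predicts a given passing grade. All data below are hypothetical, constructed so that women earn higher grades than men at the same test score, as in the Air Force findings; nothing here reproduces actual Air Force data, and the passing grade of 70 is an arbitrary assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def minimum_qualifying_score(xs, ys, passing_grade):
    """Test score at which the fitted line predicts the passing grade."""
    slope, intercept = fit_line(xs, ys)
    return (passing_grade - intercept) / slope

# Hypothetical aptitude scores and technical-school grades
men_scores, men_grades = [40, 45, 50, 55, 60], [60, 65, 70, 75, 80]
women_scores, women_grades = [40, 45, 50, 55, 60], [65, 70, 75, 80, 85]

print(minimum_qualifying_score(men_scores, men_grades, 70))      # 50.0
print(minimum_qualifying_score(women_scores, women_grades, 70))  # 45.0
```

With real data the two lines would rarely be exactly parallel, and the sampling error of each regression would have to be weighed before adopting different cutting scores.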

The British and U. S. Air Force findings cited in this report clearly indicate the need for further study of the adequacy of present tests in selecting women for jobs traditionally performed by men. The available evidence, while far from definitive, still seems sufficient to call into question the use of identical tests and test procedures with women and with men. With present tests it would appear desirable to take sex into account when considering the meaning of a given test score during the process of assigning personnel. The available data on test-criterion relationships seem better described by the diagrams presented as Figures 2 and 3 than by Figure 1. (See pages 4 and 5.)
One further point seems eminently clear: there is a definite need for further follow-up studies of men and women who (1) take the same tests, (2) work on the same jobs, and (3) are evaluated in the same ways.